Predicting diabetes-related conditions in need of intervention: Lolland-Falster Health Study, Denmark

Highlights
• Approximately 10% of adults in Denmark have prediabetes, undiagnosed, or poorly or potentially sub-regulated diabetes.
• The presence of prediabetes, undiagnosed, or poorly or potentially sub-regulated diabetes can be predicted reasonably well.
• The prediction model estimated in this study might be a useful screening tool.


Introduction
In newer Danish population-based health studies, 8.3%, 11.9%, and 12.4%, respectively, of adult participants had either known diabetes, undiagnosed diabetes or prediabetes as measured from self-reported data on diagnosis and medication, and level of glycated hemoglobin (HbA1c) in blood samples (Jørgensen et al., 2020; Bruun-Rasmussen et al., 2020). While persons with well-regulated diabetes are taken care of, persons with undiagnosed diabetes or prediabetes, as well as persons with poorly controlled or potentially sub-regulated diabetes, need intervention from the healthcare system to prevent possible long-term complications. Roughly 10% of adult Danes have prediabetes, undiagnosed diabetes, or poorly controlled or potentially sub-regulated diabetes. As a common term for these conditions, we use "diabetes mellitus related conditions in need of intervention" (DMRC) (Bruun-Rasmussen et al., 2020).
Public health actions focusing on identification, treatment and follow-up are essential for reversing and reducing the human, social and economic consequences of diabetes. This requires easily available information on how to identify potential persons with diabetes at an early stage. There are two types of diabetes prediction models: first, models for identification of risk factors for future incident cases of diabetes; second, models for targeted screening for prevalent cases of diabetes (Asgari et al., 2021). Given that we included in our outcome measure both poorly regulated diabetes and prediabetes, our model was in a way a mixture. However, as our focus was on already existing cases of DMRC, our model can best be considered as belonging to the second type of models.
Abbreviations: AUC, Area under the Curve; LOFUS, The Lolland-Falster Health Study; HbA1c, Glycated hemoglobin; BMI, Body mass index; DMRC, Diabetes mellitus related conditions in need of intervention; SOCIO13, Socio-Economic Classification.
We aimed to determine to what extent data from population registers, self-administered questionnaires and non-invasive clinical assessments can predict the presence of DMRC. We used data from the population-based health survey in Lolland-Falster, a rural-provincial part of Denmark with a life expectancy below the national average and with health problems reported more frequently than for the rest of Denmark (Holmager et al., 2021).

Study population
Data were derived from the Lolland-Falster Health Study (LOFUS), a population-based survey undertaken in a rural-provincial area of Denmark in 2016-2020. In this study, persons aged 18 years and above were randomly selected from the Central Population Register and invited with their household members of all ages to participate. Invited persons who agreed to participate completed web-based questionnaires prior to a clinical examination and provision of biological samples at one of three study centers. The overall participation rate was 36%. In an earlier study, the DMRC prevalence was investigated for a subset of LOFUS participants (Bruun-Rasmussen et al., 2020). For the present study, we included the entire LOFUS dataset of adult participants aged 20 years and above (n = 15,811) recruited from 2016 to 2020.

Population registers
From the Central Population Register, we used data on sex, age, citizenship, and marital status at the time of participation in LOFUS. Moreover, based on historical address data, LOFUS participants were divided into long-term residents and in-migrants. Long-term residents lived in Lolland-Falster for at least 10 consecutive years prior to the invitation date to LOFUS, while in-migrants had last moved into Lolland-Falster <10 years before the invitation date. From the Socio-Economic Classification (SOCIO13) in Statistics Denmark, we used data on socioeconomic status from 2017. For people of working age (30-64 years), we merged the socioeconomic data into economically self-supported persons and publicly supported persons on transfer income, while those below age 30 or above age 65 years were not classified by socioeconomic status.

Self-administered questionnaires
From the LOFUS questionnaires, we used data on educational levels divided into 'low' (≤9 years), 'medium' (10-11 years), and 'high' (12+ years). Data on frequency of intake of 5+ units of alcohol were based on a single question with four response categories: 'do not drink', 'rarely', 'monthly' and 'weekly/daily'. Smoking status was also based on a single question with three response categories: 'current', 'former' and 'never'. Current self-rated health was measured by one question with four response categories: 'very good', 'good', 'fair' and 'poor/very poor'. Self-assessment of general dietary habits was obtained from one question and categorized into three responses: 'healthy', 'somewhat healthy' and 'unhealthy'. Leisure time physical activity during the past 12 months was measured by one question with three response categories: 1) 'low', mainly sedentary activities (TV-watching, reading); 2) 'moderate', light physical activities ≥ 4 h per week (walking, bicycling, light gardening); 3) 'high', sports or other more vigorous activities ≥ 4 h per week (heavy gardening) or vigorous physical activity several times per week (heavy exercise or competitive sports).

Non-invasive clinical assessments
Data from clinical assessments included body mass index (BMI) based on measured height and weight at the clinical examination and calculated as weight in kilograms divided by height in meters squared (kg/m²), and for descriptive purposes categorized into 'underweight' (<18.5), 'normal' (18.5-24.9), 'overweight' (25.0-29.9), and 'obese' (≥30.0). Systolic and diastolic blood pressure was measured and categorized into 'normal', 'high normal' and hypertension 'grade 1', '2', and '3' (Dansk Kardiologisk Selskab). Pulse rate was measured with an oximeter in beats per minute. Waist-hip ratio (WHR) was calculated as waist circumference divided by hip circumference. For details see Supplementary Table 1.
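The two derived clinical measures can be illustrated with a minimal Python sketch. It is not the authors' code (the study analyses were run in R); the function names are hypothetical, and treating the published category bounds as half-open intervals (e.g. 'normal' as 18.5 to below 25.0) is an assumption made to avoid gaps between groups.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Descriptive BMI group; upper bounds are treated as exclusive (assumption)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"


def waist_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist circumference divided by hip circumference (same unit for both)."""
    return waist_cm / hip_cm


# Made-up measurements for illustration only, not study data
example_bmi = bmi(85.0, 1.75)
example_group = bmi_category(example_bmi)
example_whr = waist_hip_ratio(94.0, 100.0)
```

In the regression models, BMI and WHR entered as continuous variables; the categories serve only the descriptive tables.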

Outcome variable
Non-fasting blood samples were collected at the three LOFUS study centers and analyzed at the Department of Clinical Biochemistry at Nykøbing Falster Hospital, accredited by the standard ISO 15189. We used data on HbA1c. For participants with self-reported diabetes and/or self-reported use of antidiabetic medication including insulin and other antidiabetic medications, well-controlled diabetes was defined as HbA1c < 48 mmol/mol (<6.5%); potentially sub-regulated diabetes as HbA1c 48-59 mmol/mol (6.5-7.5%); and poorly controlled diabetes as HbA1c ≥ 60 mmol/mol (≥7.6%). For participants without self-reported diabetes and/or use of antidiabetic medication, no diabetes was defined as HbA1c < 42 mmol/mol (<6.0%); prediabetes as HbA1c 42-47 mmol/mol (6.0-6.5%); and undiagnosed diabetes as HbA1c ≥ 48 mmol/mol (≥6.5%) (Bruun-Rasmussen et al., 2020). It was not possible to distinguish between type 1 and type 2 diabetes. Participants with missing data on HbA1c, self-reported diagnosis of diabetes, and/or self-reported use of antidiabetic medication were labeled 'missing'.
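The classification rules above can be written as a small decision function. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, HbA1c is given in whole mmol/mol, and self-reported diagnosis and medication use are collapsed into a single flag `known_diabetes` (with `None` standing for missing information).

```python
from typing import Optional


def diabetes_status(hba1c: Optional[float], known_diabetes: Optional[bool]) -> str:
    """Classify a participant from HbA1c (mmol/mol) and self-reported status.

    known_diabetes is True if diabetes and/or antidiabetic medication was
    self-reported, False if neither was, and None if the information is missing.
    """
    if hba1c is None or known_diabetes is None:
        return "missing"
    if known_diabetes:
        if hba1c < 48:
            return "well-controlled diabetes"
        if hba1c < 60:
            return "potentially sub-regulated diabetes"
        return "poorly controlled diabetes"
    if hba1c < 42:
        return "no diabetes"
    if hba1c < 48:
        return "prediabetes"
    return "undiagnosed diabetes"
```

For example, `diabetes_status(45, False)` returns `"prediabetes"`, while `diabetes_status(52, True)` returns `"potentially sub-regulated diabetes"`.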

Statistical analysis
We tabulated summary statistics of the data described above by diabetes-related diagnostic groups. Furthermore, as the purpose of the study was identification of predictors for DMRC, participants with prediabetes, or undiagnosed, poorly controlled, or potentially sub-regulated diabetes were merged into one group designated DMRC, and participants with no diabetes, well-regulated diabetes, or missing diabetes status were collapsed into one comparison group.
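The collapse into a binary response variable can be sketched as follows. The status labels are hypothetical strings chosen for this illustration, and the example list is made up rather than taken from the study.

```python
from collections import Counter

# Status labels are hypothetical strings chosen for this sketch.
DMRC_STATUSES = frozenset({
    "prediabetes",
    "undiagnosed diabetes",
    "potentially sub-regulated diabetes",
    "poorly controlled diabetes",
})


def is_dmrc(status: str) -> int:
    """Return 1 for the DMRC group and 0 for the comparison group
    (no diabetes, well-regulated diabetes, or missing status)."""
    return 1 if status in DMRC_STATUSES else 0


# Made-up example statuses, not study data
statuses = ["no diabetes", "prediabetes", "missing",
            "well-controlled diabetes", "undiagnosed diabetes"]
outcome = [is_dmrc(s) for s in statuses]
group_sizes = Counter(outcome)  # counts of comparison (0) and DMRC (1) participants
```

Note that participants with missing status count as non-cases under this rule, which is why the sensitivity analysis excluding them is informative.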
In the analysis, sex, citizenship, marital status, migration status, socio-economic status, education, alcohol consumption, smoking status, self-rated health, dietary habits, physical activity and blood pressure were treated as categorical variables, while age, BMI, pulse rate and waist-to-hip ratio were treated as continuous variables. We performed univariate logistic regression with one explanatory variable at a time. We tested each explanatory variable for significance on a 5% level using likelihood-ratio tests. Explanatory variables found to be significantly associated with the binary response variable were included for further predictive analysis using a multiple logistic regression model.
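For a single binary predictor such as sex, the likelihood-ratio test has a closed form, because the fitted probabilities of the univariate logistic model equal the observed proportions in each group. The Python sketch below uses this special case; it is an illustration rather than the authors' R code, the function names are hypothetical, and the counts are invented. Predictors with more than two categories, or continuous predictors, would need a fitted model and a chi-square distribution with the appropriate degrees of freedom.

```python
import math


def binom_loglik(events: int, n: int) -> float:
    """Maximised Bernoulli log-likelihood for `events` cases among `n` persons."""
    p = events / n
    loglik = 0.0
    if events > 0:
        loglik += events * math.log(p)
    if events < n:
        loglik += (n - events) * math.log(1.0 - p)
    return loglik


def lr_test_binary(events_a: int, n_a: int, events_b: int, n_b: int):
    """Likelihood-ratio test of a binary predictor against the intercept-only model."""
    loglik_full = binom_loglik(events_a, n_a) + binom_loglik(events_b, n_b)
    loglik_null = binom_loglik(events_a + events_b, n_a + n_b)
    statistic = max(2.0 * (loglik_full - loglik_null), 0.0)
    # Chi-square survival function with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts for illustration only, not taken from the study tables
statistic, p_value = lr_test_binary(120, 1000, 60, 1000)
include_in_multiple_model = p_value < 0.05  # the 5% screening rule from the text
```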
The full LOFUS dataset of 15,811 participants was afterwards randomly split into a training dataset consisting of 70% of the participants and a test dataset with the remaining 30%. A full multiple logistic regression model with the candidate set of explanatory variables identified from the univariate analyses described above was run on the training dataset. The full model was reduced to include only statistically significant explanatory variables using a stepwise regression approach. The full and reduced models estimated on the training dataset were used to calculate predictions for the test dataset; a receiver operating characteristic curve was plotted to illustrate the results, and the Area Under the Curve (AUC) was calculated.
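The split and the AUC calculation can be sketched in plain Python. This is an assumption-laden illustration, not the authors' R workflow: the function names, the seed, and the example scores are hypothetical. The AUC is computed from its rank interpretation, i.e. the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Randomly split records into a training part and a test part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Made-up example: ten participant ids and four predicted risks
train_ids, test_ids = train_test_split(range(10))
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of participants; for a test set of several thousand persons a rank-based formula or a statistics library would be the practical choice.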
The statistical analysis approach described above was applied to four sets of predictor variables:
- Model 1: Variables from population registers;
- Model 2: Variables available in population registers and self-administered questionnaires;
- Model 3: Variables available in population registers, self-administered questionnaires and clinical assessment;
- Model 4: Variables from model 3 with p-values < 0.001.
As a sensitivity analysis, we divided the DMRC group into those with poorly controlled and potentially sub-regulated diabetes as one group, and those with prediabetes and undiagnosed diabetes as another group. The rationale was that the first group was expected already to be in contact with the health care system concerning their diabetes, while this was not expected for the second group. Furthermore, we performed an additional sensitivity analysis in which individuals with missing information on diabetes status were excluded. All statistical analyses were performed in R ver. 4.1.0 (R Core Team, 2020). The analysis took place at the research server in Statistics Denmark.

Ethics approval and consent to participate
Participants provided written informed consent, and the Region Zealand's Ethical Committee on Health Research (SJ-421) and the Danish Data Protection Agency (REG-24-2015) approved the study. In the case of abnormal laboratory results, the LOFUS participant was informed in a return-of-results letter and advised to consult his/her general practitioner. All participants were able to check the results of their biochemical analyses on their electronic health records on https://www.sundhed.dk.
Table 1 Number of LOFUS participants by variables from population registers and diabetes-related status.
(1) Normal Pot. = potentially. DMRC = diabetes mellitus related condition in need of intervention. SOCIO13 = persons aged 30-64 years earning their own income (self-support) or on public transfer income (public support). Long-term resident = lived in Lolland-Falster for at least 10 years before LOFUS invitation date. In-migrant = moved to Lolland-Falster within the last 10 years before LOFUS invitation date. n: Number of participants. %: Percent DMRC out of total. Percent were not calculated for the 'Missing' category or for continuous variables.

Results
In total, 10% (n = 1,575) of the adult LOFUS participants had DMRC. Of the remaining 90% (n = 14,236), who formed the comparison group, by far the majority had no diabetes (Table 1).
The proportion with DMRC was slightly higher in men, 11.5%, than in women, 8.9%, and it was higher in Danish citizens, 10.1%, than in citizens of other countries, 5.5%. As expected, DMRC was most frequent in LOFUS participants above the age of 60 years, 14-15%. Among persons aged 30-64 years, the DMRC proportion was considerably higher in publicly supported persons, 11.5%, than in self-supported persons, 6.5%.
In the LOFUS dataset, data on alcohol consumption were missing for 16% of participants. Amongst the consumption groups, DMRC was most common in non-drinkers, 11.6%. The proportion of persons with DMRC varied from 5.7% in those with 12+ years of schooling to 14.1% among those with 9 years or less of schooling. There was a steep gradient in the proportion of participants with DMRC across self-rated health with 5.9% in those reporting very good health to 20% in those reporting poor/very poor health. A steep gradient was seen for proportion of participants with DMRC across physical activity group from 15.7% in those with low activity to 6.6% in those with high activity.
A steep gradient was also seen for the proportion of participants with DMRC across BMI groups, from 3.3% in underweight persons to 18.2% in obese persons (Table 3). For blood pressure, the DMRC proportion ranged from 6.0% in those with normal blood pressure to 12-13% in those with the different grades of hypertension.
In the univariate logistic regression models with one explanatory variable at a time, all explanatory variables except marital status were found to be statistically significant. Consequently, all but one of the variables described above were included in the multiple logistic regression model developed on the training dataset.
Stepwise regression was applied to reduce the model to include only statistically significant explanatory variables. For model 1, the AUC for the test dataset was 0.685; for model 2 it was 0.711; and for models 3 and 4 it was 0.771 and 0.772, respectively. At a sensitivity of 50%, the specificity was 72% for model 1, 75% for model 2, and 84% for models 3 and 4 (Fig. 1 and Supplementary Figures S1, S2, S3 and S4). All the sensitivity analyses performed showed similar results (data not shown).
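Reading a specificity off the ROC curve at a fixed sensitivity amounts to choosing a risk threshold. The following Python sketch shows one way to do this; it is an illustration with a hypothetical function name and invented scores, and the rule used (flag everyone at or above the score of the k-th highest-scoring case) is an assumption, not necessarily how the study derived its figures.

```python
import math


def specificity_at_sensitivity(labels, scores, target_sensitivity):
    """Pick the highest threshold reaching the target sensitivity and
    return (threshold, achieved sensitivity, specificity)."""
    cases = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    # Smallest number of cases that must be flagged to reach the target
    k = max(math.ceil(target_sensitivity * len(cases)), 1)
    threshold = cases[k - 1]
    sensitivity = sum(s >= threshold for s in cases) / len(cases)
    specificity = sum(s < threshold for s in non_cases) / len(non_cases)
    return threshold, sensitivity, specificity


# Made-up predicted risks for four cases and four non-cases
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.8, 0.3, 0.25, 0.1]
threshold, sensitivity, specificity = specificity_at_sensitivity(labels, scores, 0.5)
```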

Main findings
People with prevalent prediabetes or undiagnosed, poorly controlled or potentially sub-regulated diabetes need intervention from the health care system to prevent possible long-term complications. In a population where this group of people constituted about 10% of the adult population, we found an AUC of about 0.77 for the prediction of prevalent DMRC with a model based on six easily obtainable characteristics: age, self-rated health, smoking status, and measured BMI, waist-to-hip ratio, and pulse rate.

Other studies
In a systematic review of newer prediction models for type 2 diabetes in the general population, Asgari et al. (Asgari et al., 2021) identified 22 models for the detection of undiagnosed diabetes. All but one of the 22 models included age as a risk factor; family history of diabetes was included in 15 models, hypertension/blood pressure and waist-to-hip ratio in 14 models, BMI in 12 models, sex in 10 models, and other risk factors more rarely. The median (interquartile range) AUC was 0.77 (95% CI 0.74-0.81). There is a notable overlap between the risk factors included in the models reviewed by Asgari et al. and the risk factors included in our final model, which also included age, waist-to-hip ratio, and BMI. Blood pressure was eliminated from our final model 3, while pulse rate remained.
Only two of the models reviewed by Asgari et al. used European data. Stiglic et al. (Štiglic et al., 2018) developed a model for Slovenia reaching an AUC = 0.85, but this model was one of the few including biomarker data in the form of history of high blood glucose. Gray et al. (Gray et al., 2013) developed a model for Portugal including only age, sex, BMI and current hypertension as predictors. An AUC = 0.74 was found for prevalent type 2 diabetes. One previous Danish model (Glümer et al., 2004) resembled ours, as its purpose was detection of prevalent, unknown diabetes. The model was built on the Inter99 study with data collected in the south-western Copenhagen County from 1999 to 2001. Half of the data were used for model training, and the other half for testing. Age, sex, BMI, hypertension, physical activity and parents' history of diabetes were predictors in the final model, with AUC = 0.76 in the testing dataset and AUC = 0.80 in an independent dataset collected via general practitioners in 1999 in Aarhus, Denmark (Glümer et al., 2004).

Strengths and limitations
Our study population was derived from a population-based health survey. Data came from public registers, self-administered questionnaires and clinical non-invasive examinations. We included the three sets of data stepwise in the analysis. In this process, some variables changed status; e.g., sex was a highly statistically significant variable when only register and questionnaire data were included in the analysis, but it became statistically insignificant when clinical data were also taken into account, reflecting differences between men and women in the clinical data.
We did not have data on family history of diabetes, the most widely used variable in other models for prediction of prevalent diabetes. We had data on blood pressure, another widely used variable, but in our model this variable was statistically non-significant in the multivariate analysis. Nevertheless, the AUC = 0.77 in our model was at the level of the average AUC = 0.77 found by Asgari et al. (Asgari et al., 2021) for models predicting prevalent diabetes, and at the level of AUC = 0.76 to 0.81 found by Kengne et al. (Abbasi et al., 2012) for models predicting incident diabetes based on European data.
A potential limitation for the applicability of our model at the population level is that only 36% of the invited persons participated in the health survey. It should, however, be noted that when selectivity in participation was investigated half-way through the survey, only a moderate gradient was seen in non-participation by, e.g., education: relative risks ranged from 1 in the high group to 1.13 in the medium group and 1.38 in the low group. Still, when this gradient is combined with the gradient in prevalence of DMRC across education among the study participants, we may anticipate that the true prevalence of DMRC in the Lolland-Falster population is higher than the recorded 10%.

Clinical implications
In investigating the performance of 12 models for prediction of incident diabetes in datasets from eight European countries, Kengne et al. concluded that "model performance differed across countries and no model outperformed the others enough to be uniquely recommended". This finding also seems very relevant when it comes to models for prediction of prevalent diabetes or diabetes-related conditions. Our model based on data from Lolland-Falster, a rural-provincial, health-disadvantaged area of Denmark, reached an AUC = 0.77 for prediction of prevalent DMRC. However, our model did not include family history of diabetes, the most widely used predictor in other models (Asgari et al., 2021), and the lack of this variable might be a limitation for a wider applicability. Nevertheless, besides local use, the model might be useful in similar settings where data are not available for construction of local models.
Our model is very user-friendly. In Denmark, age, in terms of date of birth, is included in the personal identification numbers used universally in healthcare. Data on self-rated health and smoking status can be obtained from very simple questions with 3-4 response categories. BMI, waist-to-hip ratio and pulse rate can be measured by any person in health care and potentially by the persons themselves. The model could form the basis for screening for prevalent DMRC. Exactly where to set the possible cut-off points for the model variables in a screening program depends on the chosen benefit-to-harm ratio. In the model, a sensitivity of 50% corresponded to a specificity of 84%. This means that at the cut-off point where half of the prevalent DMRC cases will be detected, 16% of persons without DMRC will be identified as possible cases. In this potential DMRC screening, participants testing positive will be recommended to have a blood sample taken for measurement of HbA1c. For participants with HbA1c levels < 42 mmol/mol (<6.0%), the screening test will be false positive, and no further action will be taken. This might be an acceptable price to pay.
Table 4 P-values for the four multiple regression models when applied on the training set.
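To make the benefit-to-harm trade-off concrete, the figures reported in this study (a DMRC prevalence of about 10%, a sensitivity of 50% and a specificity of 84%) can be converted into expected screening counts. The Python sketch below does this for a hypothetical group of 1,000 screened adults; the function name and the group size are illustrative assumptions, and the calculation presumes that the screened population resembles the study population.

```python
def screening_outcomes(n, prevalence, sensitivity, specificity):
    """Expected counts in a screened group and the positive predictive value."""
    cases = n * prevalence
    non_cases = n - cases
    true_positive = cases * sensitivity
    false_negative = cases - true_positive
    true_negative = non_cases * specificity
    false_positive = non_cases - true_negative
    return {
        "true_positive": true_positive,
        "false_negative": false_negative,
        "false_positive": false_positive,
        "true_negative": true_negative,
        # Share of screen-positives who actually have DMRC
        "ppv": true_positive / (true_positive + false_positive),
    }


# Hypothetical screening of 1,000 adults using the reported study figures
result = screening_outcomes(1000, prevalence=0.10, sensitivity=0.50, specificity=0.84)
```

Under these assumptions, about 50 of the 100 persons with DMRC would be detected, while about 144 of the 900 persons without DMRC would also test positive, so roughly one in four persons referred for an HbA1c measurement would turn out to have DMRC.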

Conclusions
Based on health survey data from a rural-provincial part of Denmark, we found that the presence of diabetes mellitus related conditions in need of intervention could be predicted from age, self-rated health, smoking status, BMI, waist-to-hip ratio and pulse rate. In this simple model, an AUC of 0.77 was found, which is in line with AUC values of diabetes prediction models that include data on family history of diabetes. As found for other diabetes prediction models, our model might not be generally applicable, but it might be useful locally and in similar settings.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.